Fragmenting XML Documents via Structural Constraints
Authors
Abstract
XML query processors suffer from main-memory limitations that prevent them from processing large XML documents. While content-based predicates can be used to project down parts of the documents, it may still be necessary to resize the obtained projections according to structural constraints. In this paper, we consider size, tree-width and tree-depth constraints to enable a structure-driven fragmentation of XML documents. Although a set of heuristics performing this kind of fragmentation can be easily devised, a key problem is determining the values of the structural constraints given as input to these heuristics, since the search space is prohibitively large. To alleviate the problem, we introduce special-purpose structure histograms that report the constraint values for the fragments of a given document. We then present a prediction algorithm that probes those histograms to output the expected number of fragments when fixed input values of the constraints are used. Furthermore, we study how to relax the fixed constraints by means of classical distributions. An experimental evaluation shows the effectiveness of our fragmentation methodology on some representative XML datasets.
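To make the notion of structural constraints concrete, the following is a minimal sketch of one possible greedy heuristic, not the heuristics of the paper, which the abstract does not specify. It assumes that "size" means the number of element nodes in a fragment and "tree-depth" means the number of levels in a fragment; the tree-width constraint is left out. The function name `fragment` and the parameters `max_size` and `max_depth` are hypothetical.

```python
import xml.etree.ElementTree as ET


def fragment(root, max_size, max_depth):
    """Greedy bottom-up cut of an element tree.

    Returns a list of (fragment_root_tag, size, depth) tuples, where size
    is the number of element nodes in the fragment and depth its number
    of levels. Assumes max_size >= 1 and max_depth >= 1. Illustrative
    only: recursion limits it to moderately deep documents.
    """
    fragments = []

    def visit(node):
        # Size and depth of the part of the subtree still attached to node.
        size, depth = 1, 1
        for child in node:
            c_size, c_depth = visit(child)
            if size + c_size > max_size or c_depth + 1 > max_depth:
                # The child's remaining subtree does not fit: cut it off.
                fragments.append((child.tag, c_size, c_depth))
            else:
                size += c_size
                depth = max(depth, c_depth + 1)
        return size, depth

    fragments.append((root.tag,) + visit(root))
    return fragments


doc = ET.fromstring("<a><b><c/><d/></b><e><f><g/></f></e></a>")
result = fragment(doc, max_size=3, max_depth=2)
print(result)
```

The number of fragments produced depends heavily on the chosen `max_size` and `max_depth`, which is the quantity the abstract's histogram-based prediction is meant to estimate without running the fragmentation for every candidate value.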
Similar resources
Metaheuristic Clustering of Persian XML Documents Based on Structural and Content Similarity
Given the growing number of XML documents, organizing them effectively is essential for retrieving useful information from them. One possible solution is to cluster XML documents in order to discover knowledge. A key issue in clustering XML documents is how to measure the similarity between them. Conventional clustering of text documents using a do...
Structural Similarity Evaluation Between XML Documents and DTDs
The automatic processing and management of XML-based data are increasingly popular research issues due to the increasingly abundant use of XML, especially on the Web. Nonetheless, several operations based on the structure of XML data have not yet received strong attention. Among these is the process of matching XML documents and XML grammars, useful in various applications such as documents classifi...
Distribution Design for XML documents
The web is often seen as the world's largest database and XML is regarded to provide its data model. As XML data is naturally distributed across the web it should be considered as a distributed database and subject to distribution design. The main tasks of distribution design are fragmenting the underlying database schema and allocating the fragments to different sites. The aim of fragmentation...
QMatch - Using paths to match XML schemas
Integration of multiple heterogeneous data sources continues to be a critical problem for many application domains and a challenge for researchers world-wide. With the increasing popularity of the XML model and the proliferation of XML documents on-line, automated matching of XML documents and databases has become a critical issue. In this paper, we present a hybrid schema match algorithm, QMat...
Generating XML structure using examples and constraints
This paper presents a framework for automatically generating structural XML documents. The user provides a target DTD and an example of an XML document, called a Generate-XML-ByExample Document, or a GxBE document, for short. GxBE documents use a natural declarative syntax, which includes XPath expressions and the function count. Using GxBE documents, users can express important global and loca...
Publication date: 2006